Reading position of teletype braille


Teletype Modifications

The following modifications have been made to the
Model 33 teletype (none of these prevents the return of
the unit to the production of standard output):

1. The teletype print wheel has been modified (Figures
1 and 2) to emboss the seven braille columns. This was
accomplished by removing the type from the wheel and
inserting pins. Each pin corresponds to one braille dot. To
prevent extraneous dots from appearing, the braille columns
are placed in alternate type positions on the wheel.
All columns are positioned on the same vertical level. The
extension of the pins beyond the surface of the wheel has
been adjusted to account for the resiliency of the platen,
the force of the hammer, and the curvature of the platen.

2. The teletype platen has been covered by a length of
surgical rubber tubing 1 in. in diameter and Jg- in. thick.
This provides the backing necessary to easily emboss the
braille dots without perforating the paper.

3. The ribbon has been removed.

4. The edge of the plastic window has been bent down
toward the platen to allow the braille line to be read two
lines after it has been formed (Figure 3). This modification
is not necessary, but it is a convenience to the reader.

5. The wire paper guide in front of the platen has been
removed to provide room for modification 4.

If the user is willing to wait a little longer (four braille
lines) before reading a braille line, modifications 4 and 5
can be eliminated. Then, given a braille wheel and the
rubber backing, any teletype can be converted to a brailler.


Teletype Character Subset Representation and Brailler Left
Column Equivalents

[Table: rows of Character, Braille, and Equivalent. The braille cells and most characters did not survive. Surviving equivalents: AE AC HJ HE HC LJ LE AN; CG LG CJ CL CN LC; NN NA GA GG.]
Regular Expression Search
Algorithm 

Ken Thompson 

Bell Telephone Laboratories, Inc., Murray Hill, New Jersey 

A method for locating specific character strings embedded 
in character text is described and an implementation of this 
method in the form of a compiler is discussed. The compiler 
accepts a regular expression as source language and pro- 
duces an IBM 7094 program as object language. The object 
program then accepts the text to be searched as input and 
produces a signal every time an embedded string in the text 
matches the given regular expression. Examples, problems, 
and solutions are also presented. 

KEY WORDS AND PHRASES: search, match, regular expression 
CR CATEGORIES: 3.74, 4.49, 5.32 

The Algorithm 

Previous search algorithms involve backtracking when 
a partially successful search path fails. This necessitates 
a lot of storage and bookkeeping, and executes slowly. In 
the regular expression recognition technique described in 
this paper, each character in the text to be searched is 
examined in sequence against a list of all possible current 
characters. During this examination a new list of all 
possible next characters is built. When the end of the 
current list is reached, the new list becomes the current 
list, the next character is obtained, and the process continues.
In the terms of Brzozowski [1], this algorithm continually
takes the left derivative of the given regular expression
with respect to the text to be searched. The
parallel nature of this algorithm makes it extremely fast. 
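The list-swapping loop described above can be sketched in Python. This is only an illustration, not the paper's 7094 implementation: the hand-written state table for a(b|c)*d, the function name `search`, and the choice to report end positions are all assumptions made for the example.

```python
def search(text):
    """Return the end position (exclusive) of every match of a(b|c)*d.

    Hypothetical state table: state 0 expects 'a'; state 1 is the loop
    that accepts 'b' or 'c' and leaves on 'd'; state 2 means "matched".
    """
    transitions = {0: {"a": 1}, 1: {"b": 1, "c": 1, "d": 2}}
    accept = 2
    hits = []
    current = []                      # list of possible current states
    for pos, ch in enumerate(text):
        current = current + [0]       # start a new search at every character
        upcoming = []                 # list of possible next states
        for state in current:
            target = transitions[state].get(ch)
            if target is None:
                continue              # this search path fails; no backtracking
            if target == accept:
                hits.append(pos + 1)
            elif target not in upcoming:
                upcoming.append(target)
        current = upcoming            # the new list becomes the current list
    return hits
```

Each text character is examined once against every live path, so a failed path is simply dropped rather than rewound.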

The Implementation 

The specific implementation of this algorithm is a com- 
piler that translates a regular expression into IBM 7094 
code. The compiled code, along with certain runtime 
routines, accepts the text to be searched as input and 
finds all substrings in the text that match the regular 
expression. The compiling phase of the implementation does
not detract from the overall speed since any search routine
must translate the input regular expression into some
sort of machine accessible form.


In the compiled code, the lists mentioned in the algo- 
rithm are not characters, but transfer instructions into 
the compiled code. The execution is extremely fast since 
a transfer to the top of the current list automatically 
searches for all possible sequel characters in the regular 
expression. 

This compile-search algorithm is incorporated as the 
context search in a time-sharing text editor. This is by 
no means the only use of such a search routine. For 
example, a variant of this algorithm is used as the symbol 
table search in an assembler. 

It is assumed that the reader is familiar with regular 
expressions [2] and the machine language of the IBM 7094 
computer [3]. 

The Compiler 

The compiler consists of three concurrently running 
stages. The first stage is a syntax sieve that allows only 
syntactically correct regular expressions to pass. This 
stage also inserts the operator “•” for juxtaposition of
regular expressions. The second stage converts the regular 
expression to reverse Polish form. The third stage is the 
object code producer. The first two stages are straight- 
forward and are not discussed. The third stage expects a 
syntactically correct, reverse Polish regular expression. 
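The first two stages are not given in the paper, so the following Python is only a guess at one reasonable way to do them. The function names, the precedence table, and the use of a shunting-yard loop are assumptions; the sketch also skips the syntax checking that the real sieve performs.

```python
# Assumed precedences: juxtaposition binds tighter than alternation.
PRECEDENCE = {"|": 1, "•": 2}


def insert_juxtaposition(regex):
    """Stage one (without the syntax checks): make juxtaposition explicit."""
    out = []
    prev = None
    for ch in regex:
        # An operand or "(" that follows an operand, ")" or "*" is juxtaposed.
        if prev is not None and prev not in "(|" and ch not in "|)*":
            out.append("•")
        out.append(ch)
        prev = ch
    return "".join(out)


def to_postfix(regex):
    """Stage two: convert the explicit form to reverse Polish."""
    output, stack = [], []
    for ch in insert_juxtaposition(regex):
        if ch == "(":
            stack.append(ch)
        elif ch == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()                       # discard the "("
        elif ch == "*":
            output.append(ch)                 # postfix unary operator
        elif ch in PRECEDENCE:
            while (stack and stack[-1] != "("
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[ch]):
                output.append(stack.pop())
            stack.append(ch)
        else:
            output.append(ch)                 # ordinary character operand
    while stack:
        output.append(stack.pop())
    return "".join(output)
```

For the paper's running example, `to_postfix("a(b|c)*d")` yields the string the third stage expects.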

The regular expression a(b|c)*d will be carried through
as an example. This expression is translated into abc|*•d•
by the first two stages. A functional description of the
third stage of the compiler follows:

The heart of the third stage is a pushdown stack. Each
entry in the pushdown stack is a pointer to the compiled
code of an operand. When a binary operator (“|” or “•”)
is compiled, the top (most recent) two entries on the stack 
are combined and a resultant pointer for the operation re- 
places the two stack entries. The result of the binary 
operator is then available as an operand in another opera- 
tion. Similarly, a unary operator (“*”) operates on the top 
entry of the stack and creates an operand to replace that 
entry. When the entire regular expression is compiled, 
there is just one entry in the stack, and that is a pointer to 
the code for the regular expression. 

The compiled code invokes one of two functional rou- 
tines. The first is called NNODE. NNODE matches a 
single character and will be represented by an oval con- 
taining the character that is recognized. The second func- 
tional routine is called CNODE. CNODE will split the 
current search path. It is represented by ⊕ with one input
path and two output paths. 

Figure 1 shows the functions of the third stage of the 
compiler in translating the example regular expression.
The first three characters of the example, a, b, c, each
create a stack entry, S[i], and an NNODE box.



Fig. 1


The next character | combines the operands b and c
with a CNODE to form b|c as an operand. (See Figure 2.)



Fig. 2


The next character * operates on the top entry on the
stack. The closure operator is realized with a CNODE by
noting the identity X* = λ|XX*, where X is any regular
expression (operand) and λ is the null regular expression.
(See Figure 3.)



Fig. 3


The next character • compiles no code, but just
combines the top two entries on the stack to be executed
sequentially. The stack now points to the single operand
a•(b|c)*. (See Figure 4.)



Fig. 4

The final two characters d• compile and connect an
NNODE onto the existing code to produce the final regular
expression in the only stack entry. (See Figure 5.)



Fig. 5

A working example of the third stage of the compiler 
appears below. It is written in Algol-60 and produces
object programs in IBM 7094 machine language.

begin
  integer procedure get character; code;
  integer procedure instruction(op, address, tag, decrement); code;
  integer procedure value(symbol); code;
  integer procedure index(character); code;
  integer char, lc, pc;
  integer array stack[0:10], code[0:300];
  switch switch := alpha, juxta, closure, or, eof;

  lc := pc := 0;
advance:
  char := get character;
  go to switch[index(char)];
alpha:
  code[pc] := instruction(‘tra’, value(‘code’)+pc+1, 0, 0);
  code[pc+1] := instruction(‘txl’, value(‘fail’), 1, -char-1);
  code[pc+2] := instruction(‘txh’, value(‘fail’), 1, -char);
  code[pc+3] := instruction(‘tsx’, value(‘nnode’), 4, 0);
  stack[lc] := pc;
  pc := pc+4;
  lc := lc+1;
  go to advance;
juxta:
  lc := lc-1;
  go to advance;
closure:
  code[pc] := instruction(‘tsx’, value(‘cnode’), 4, 0);
  code[pc+1] := code[stack[lc-1]];
  code[stack[lc-1]] := instruction(‘tra’, value(‘code’)+pc, 0, 0);
  pc := pc+2;
  go to advance;
or:
  code[pc] := instruction(‘tra’, value(‘code’)+pc+4, 0, 0);
  code[pc+1] := instruction(‘tsx’, value(‘cnode’), 4, 0);
  code[pc+2] := code[stack[lc-1]];
  code[pc+3] := code[stack[lc-2]];
  code[stack[lc-2]] := instruction(‘tra’, value(‘code’)+pc+1, 0, 0);
  code[stack[lc-1]] := instruction(‘tra’, value(‘code’)+pc+4, 0, 0);
  pc := pc+4;
  lc := lc-1;
  go to advance;
eof:
  code[pc] := instruction(‘tra’, value(‘found’), 0, 0);
  pc := pc+1
end

The integer procedure get character returns the next 
character from the second stage of the compiler. The 
integer procedure index returns an integer index to classify
the character. The integer procedure value returns the 
location of a named subroutine. It is an assembler symbol 
table routine. The integer procedure instruction returns an 
assembled 7094 instruction. 

When the compiler receives the example regular expres- 
sion, the following 7094 code is produced: 


CODE    TRA   CODE+1             0   a
        TXL   FAIL,1,-'a'-1      1
        TXH   FAIL,1,-'a'        2
        TSX   NNODE,4            3
        TRA   CODE+16            4   b
        TXL   FAIL,1,-'b'-1      5
        TXH   FAIL,1,-'b'        6
        TSX   NNODE,4            7
        TRA   CODE+16            8   c
        TXL   FAIL,1,-'c'-1      9
        TXH   FAIL,1,-'c'       10
        TSX   NNODE,4           11
        TRA   CODE+16           12   |
        TSX   CNODE,4           13
        TRA   CODE+9            14
        TRA   CODE+5            15
        TSX   CNODE,4           16   *
        TRA   CODE+13           17
        TRA   CODE+19           18   •d
        TXL   FAIL,1,-'d'-1     19
        TXH   FAIL,1,-'d'       20
        TSX   NNODE,4           21
        TRA   FOUND             22   •eof
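As a cross-check on the listing, the third stage can be transliterated into Python. This is a sketch under stated assumptions: instructions are kept as text strings rather than assembled 7094 words, a Python list stands in for the stack and code arrays, and the function name `compile_postfix` is invented for the example.

```python
def compile_postfix(postfix):
    """Translate a reverse Polish regular expression into symbolic code.

    Mirrors the alpha, juxta, closure, or and eof cases of the Algol
    program; the list index of each instruction is its CODE offset.
    """
    code, stack = [], []
    for ch in postfix:
        pc = len(code)
        if ch == "•":                       # juxta: compiles no code
            stack.pop()
        elif ch == "*":                     # closure on the top operand
            operand = stack[-1]
            code += ["TSX CNODE,4", code[operand]]
            code[operand] = f"TRA CODE+{pc}"
        elif ch == "|":                     # or: combine the top two operands
            right, left = stack[-1], stack[-2]
            code += [f"TRA CODE+{pc + 4}", "TSX CNODE,4",
                     code[right], code[left]]
            code[left] = f"TRA CODE+{pc + 1}"
            code[right] = f"TRA CODE+{pc + 4}"
            stack.pop()
        else:                               # alpha: match one character
            code += [f"TRA CODE+{pc + 1}",
                     f"TXL FAIL,1,-'{ch}'-1",
                     f"TXH FAIL,1,-'{ch}'",
                     "TSX NNODE,4"]
            stack.append(pc)
    code.append("TRA FOUND")                # eof
    return code
```

Running `compile_postfix("abc|*•d•")` gives 23 instructions whose transfer targets agree with the listing, for example `TRA CODE+16` at offsets 4, 8, and 12.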


Runtime Routines 

During execution of the code produced by the compiler, 
two lists (named CLIST and NLIST) are maintained by 
the subroutines CNODE and NNODE. CLIST contains 
a list of TSX **,2 instructions terminated by a TRA 
XCHG. Each TSX represents a partial match of the 
regular expression and the TRA XCHG represents the 
end of the list of possible matches. A call to CNODE from 
location x moves the TRA XCHG instruction down 
one location in CLIST and inserts in its place a TSX 
x+1,2 instruction. Control is then returned to x+2.
This effectively branches the current search path. The
path at x+1 is deferred until later while the branch at
x+2 is searched immediately. The code for CNODE is as
follows: 

CNODE   AXC   **,7        CLIST COUNT
        CAL   CLIST,7
        SLW   CLIST+1,7   MOVE TRA XCHG INSTRUCTION
        PCA   ,4
        ACL   TSXCMD
        SLW   CLIST,7     INSERT NEW TSX **,2 INSTRUCTION
        TXI   *+1,7,-1
        SCA   CNODE,7     INCREMENT CLIST COUNT
        TRA   2,4         RETURN
TSXCMD  TSX   1,2         CONSTANT, NOT EXECUTED

The subroutine NNODE is called after a successful
match of the current character. This routine, when called
from location x, places a TSX x+1,2 in NLIST. It
then returns to the next instruction in CLIST. This sets 
up the place in CODE to be executed with the next 
character. The code for NNODE is as follows: 

NNODE   AXC   **,7        NLIST COUNT
        PCA   ,4
        ACL   TSXCMD
        SLW   NLIST,7     PLACE NEW TSX **,2 INSTRUCTION
        TXI   *+1,7,-1
        SCA   NNODE,7     INCREMENT NLIST COUNT
        TRA   1,2

The routine FAIL simply returns to the next entry in 
the current list CLIST. 

FAIL TRA 1,2 

The routine XCHG is transferred to when the current 
list is exhausted. This routine copies NLIST onto CLIST, 
appends a TRA XCHG instruction, gets a new character 
in index register one, and transfers to CLIST. The instruc- 
tion TSX CODE, 2 is also executed to start a new 
search of the entire regular expression with each character. 
Thus the regular expression will be found anywhere in the 
text to be searched. Variations can be easily incorporated. 
The code for XCHG is : 


XCHG    LAC   NNODE,7     PICK UP NLIST COUNT
        AXC   0,6         PICK UP CLIST COUNT
X1      TXL   X2,7,0
        TXI   *+1,7,1
        CAL   NLIST,7
        SLW   CLIST,6     COPY NLIST ONTO CLIST
        TXI   X1,6,-1
X2      CLA   TRACMD
        SLW   CLIST,6     PUT TRA XCHG AT BOTTOM
        SCA   CNODE,6     INITIALIZE CNODE COUNT
        SCA   NNODE,0     INITIALIZE NNODE COUNT
        TSX   GETCHA,4
        PAC   ,1          GET NEXT CHARACTER
        TSX   CODE,2      START SEARCH
        TRA   CLIST       FINISH SEARCH
TRACMD  TRA   XCHG        CONSTANT, NOT EXECUTED


Initialization is required to set up the initial lists and
start the first character.

INIT    SCA   NNODE,0
        TRA   XCHG

The routine FOUND is transferred to for each successful 
match of the entire regular expression. There is a one 
character delay between the end of a successful match 
and the transfer to FOUND. The null regular expression 
is found on the first character while one character regular 
expressions are found on the second character. This means 
that an extra (end of file) character must be put through 
the code in order to obtain complete results. FOUND depends
upon the use of the search routine and is therefore
not discussed in detail.

The integer procedure GETCHA (called from XCHG)
obtains the next character from the text to be searched.
This character is right adjusted in the accumulator.
GETCHA must also recognize the end of the text and
terminate the search.
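The interplay of CNODE, NNODE, FAIL, XCHG, and FOUND can be modeled with a small Python interpreter over the compiled example. This is a behavioral sketch, not a 7094 emulator. It assumes list entries can be modeled as bare code offsets, that the TXL/TXH pair acts as "fail if the text character is greater than / less than the expected one", and that a NUL character can stand in for the extra end-of-file character. The names `CODE` and `run_search` are illustrative.

```python
# The 23 compiled instructions for a(b|c)*d, indexed by CODE offset.
CODE = [
    ("TRA", 1), ("TXL", "a"), ("TXH", "a"), ("NNODE",),
    ("TRA", 16), ("TXL", "b"), ("TXH", "b"), ("NNODE",),
    ("TRA", 16), ("TXL", "c"), ("TXH", "c"), ("NNODE",),
    ("TRA", 16), ("CNODE",), ("TRA", 9), ("TRA", 5),
    ("CNODE",), ("TRA", 13),
    ("TRA", 19), ("TXL", "d"), ("TXH", "d"), ("NNODE",),
    ("FOUND",),
]


def run_search(text):
    """Return the end position (exclusive) of each match found."""
    found, nlist = [], []
    for pos, ch in enumerate(text + "\0"):   # extra end-of-file character
        # XCHG: NLIST becomes CLIST, and a new search starts at CODE.
        clist, nlist = [0] + nlist, []
        i = 0
        while i < len(clist):                # CLIST may grow while it is run
            pc = clist[i]
            i += 1
            while pc is not None:
                op = CODE[pc]
                if op[0] == "TRA":
                    pc = op[1]
                elif op[0] == "TXL":         # FAIL: on to the next CLIST entry
                    pc = None if ch > op[1] else pc + 1
                elif op[0] == "TXH":
                    pc = None if ch < op[1] else pc + 1
                elif op[0] == "CNODE":       # defer x+1, continue at x+2
                    clist.append(pc + 1)
                    pc += 2
                elif op[0] == "NNODE":       # resume at x+1 on the next character
                    nlist.append(pc + 1)
                    pc = None
                else:                        # FOUND, one character late
                    found.append(pos)
                    pc = None
    return found
```

The one-character delay is visible here: the match "ad" in a two-character text is only reported while the end-of-file character is being processed.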

Notes 

Code compiled for a** will go into a loop due to the
closure operator on an operand containing the null regular
expression, λ. There are two ways out of this problem. The
first is to not allow such an expression to get through the
syntax sieve. In most practical applications, this would
not be a serious restriction. The second way out is to
recognize lambda separately in operands and remember
the CODE location of the recognition of lambda. This
means that a* is compiled as a search for λ|aa*. If the
closure operation is performed on an operand containing
lambda, the instruction TRA FAIL is overlaid on that
portion of the operand that recognizes lambda. Thus a**
is compiled as λ|aa*(aa*)*.

The array lambda is added to the third stage of the previous
compiler. It contains zero if the corresponding
operand does not contain λ. It contains the code location
of the recognition of λ if the operand does contain λ. (The
code location of the recognition of λ can never be zero.)

begin
  integer procedure get character; code;
  integer procedure instruction(op, address, tag, decrement); code;
  integer procedure value(symbol); code;
  integer procedure index(character); code;
  integer char, lc, pc;
  integer array stack, lambda[0:10], code[0:300];
  switch switch := alpha, juxta, closure, or, eof;

  lc := pc := 0;
advance:
  char := get character;
  go to switch[index(char)];
alpha:
  code[pc] := instruction(‘tra’, value(‘code’)+pc+1, 0, 0);
  code[pc+1] := instruction(‘txl’, value(‘fail’), 1, -char-1);
  code[pc+2] := instruction(‘txh’, value(‘fail’), 1, -char);
  code[pc+3] := instruction(‘tsx’, value(‘nnode’), 4, 0);
  stack[lc] := pc;
  lambda[lc] := 0;
  pc := pc+4;
  lc := lc+1;
  go to advance;
juxta:
  if lambda[lc-1] = 0 then
    lambda[lc-2] := 0;
  lc := lc-1;
  go to advance;
closure:
  code[pc] := instruction(‘tsx’, value(‘cnode’), 4, 0);
  code[pc+1] := code[stack[lc-1]];
  code[pc+2] := instruction(‘tra’, value(‘code’)+pc+6, 0, 0);
  code[pc+3] := instruction(‘tsx’, value(‘cnode’), 4, 0);
  code[pc+4] := code[stack[lc-1]];
  code[pc+5] := instruction(‘tra’, value(‘code’)+pc+6, 0, 0);
  code[stack[lc-1]] := instruction(‘tra’, value(‘code’)+pc+3, 0, 0);
  if lambda[lc-1] ≠ 0 then
    code[lambda[lc-1]] := instruction(‘tra’, value(‘fail’), 0, 0);
  lambda[lc-1] := pc+5;
  pc := pc+6;
  go to advance;
or:
  code[pc] := instruction(‘tra’, value(‘code’)+pc+4, 0, 0);
  code[pc+1] := instruction(‘tsx’, value(‘cnode’), 4, 0);
  code[pc+2] := code[stack[lc-1]];
  code[pc+3] := code[stack[lc-2]];
  code[stack[lc-2]] := instruction(‘tra’, value(‘code’)+pc+1, 0, 0);
  code[stack[lc-1]] := instruction(‘tra’, value(‘code’)+pc+4, 0, 0);
  if lambda[lc-2] = 0 then
  begin if lambda[lc-1] ≠ 0 then
    lambda[lc-2] := lambda[lc-1]
  end else
  if lambda[lc-1] ≠ 0 then
    code[lambda[lc-1]] :=
      instruction(‘tra’, value(‘code’)+lambda[lc-2], 0, 0);
  pc := pc+4;
  lc := lc-1;
  go to advance;
eof:
  code[pc] := instruction(‘tra’, value(‘found’), 0, 0);
  pc := pc+1
end

The next note on the implementation is that the sizes 
of the two runtime lists can grow quite large. For example,
the expression a*a*a*a*a*a* explodes when it encounters
a few concurrent a’s. This expression is equivalent to a*
and therefore should not generate so many entries. Such
redundant searches can be easily terminated by having
NNODE (CNODE) search NLIST (CLIST) for a matching
entry before it puts an entry in the list. This now gives
a maximum size on the number of entries that can be in the
lists. The maximum number of entries that can be in
CLIST is the number of TSX CNODE,4 and TSX
NNODE,4 instructions compiled. The maximum number
of entries in NLIST is just the number of TSX
NNODE,4 instructions compiled. In practice, these
maxima are never met.
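The effect of the duplicate check can be seen in a small Python model. It is a sketch under assumptions: the compiler is a tuple-based rendering of the first (non-lambda) third stage, list entries are bare code offsets, and `compile_postfix`, `peak_nlist`, and the `dedup` flag are names made up for the illustration.

```python
def compile_postfix(postfix):
    """Postfix regular expression -> list of symbolic instruction tuples."""
    code, stack = [], []
    for ch in postfix:
        pc = len(code)
        if ch == "•":
            stack.pop()
        elif ch == "*":
            top = stack[-1]
            code += [("CNODE",), code[top]]
            code[top] = ("TRA", pc)
        elif ch == "|":
            right, left = stack[-1], stack[-2]
            code += [("TRA", pc + 4), ("CNODE",), code[right], code[left]]
            code[left] = ("TRA", pc + 1)
            code[right] = ("TRA", pc + 4)
            stack.pop()
        else:
            code += [("TRA", pc + 1), ("TXL", ch), ("TXH", ch), ("NNODE",)]
            stack.append(pc)
    code.append(("FOUND",))
    return code


def peak_nlist(code, text, dedup):
    """Run the search and return the largest NLIST size observed."""
    nlist, peak = [], 0
    for ch in text:
        clist, nlist = [0] + nlist, []
        i = 0
        while i < len(clist):
            pc = clist[i]
            i += 1
            while pc is not None:
                op = code[pc]
                if op[0] == "TRA":
                    pc = op[1]
                elif op[0] == "TXL":
                    pc = None if ch > op[1] else pc + 1
                elif op[0] == "TXH":
                    pc = None if ch < op[1] else pc + 1
                elif op[0] == "CNODE":
                    # With dedup, skip entries already waiting in CLIST.
                    if not (dedup and pc + 1 in clist):
                        clist.append(pc + 1)
                    pc += 2
                elif op[0] == "NNODE":
                    # With dedup, skip entries already waiting in NLIST.
                    if not (dedup and pc + 1 in nlist):
                        nlist.append(pc + 1)
                    pc = None
                else:                        # FOUND; matches are not recorded here
                    pc = None
        peak = max(peak, len(nlist))
    return peak
```

With the check enabled, NLIST in this model never exceeds the six NNODE calls compiled for a*a*a*a*a*a*, however many a's are read; without it, the list already holds 27 entries after the second a.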

The execution is so fast that any other recognition and
deleting of redundant searches, such as described by Kuno
and Oettinger [4], would probably waste time.

This compiling scheme is very amenable to the extension 
of the regular expressions recognized. Special characters 
can be introduced to match special situations or sequences. 
Examples include: beginning of line character, end of line 
character, any character, alphabetic character, any num- 
ber of spaces character, lambda, etc. It is also easy to 
incorporate new operators in the regular expression rou- 
tine. Examples include: not, exclusive or, intersection, etc. 
Program for the Experienced User

Presently available online debugging routines are often
unsatisfactory for the experienced user because they require
unnecessarily rigid and complicated typing formats, make it
difficult for the user to correct typing errors, and consume
excessive memory with intricate features. In a debugging
program it is of prime importance that the program be simple,
flexible, and highly efficient to use. Communication between
the user and the debugging program can be improved by
using certain techniques applicable to most online debugging
programs.

These techniques are presented and are illustrated by
their use in OPAK (octal package), a debugging program
coded for the PDP-5/8 and the SDS-930.
The compromise between economy of utility program core
storage and incorporation of elegant debugging features is
discussed.

KEY WORDS AND PHRASES: debugging, utility program, programming
languages

CR CATEGORIES: 4.42

1. Introduction

The past decade has witnessed a proliferation of small,
versatile digital computers, often purchased for a special
purpose or for use as an experimental tool. They are generally
programmed by experienced users; any novice
exposed to such a computer becomes experienced in a
matter of days. The most important single utility program
is probably the online debugging program. Debugging
programs are in abundance: some are supplied by computer
manufacturers, such as DEC’s DDT (digital debugging
tape) and SDS’s AID, and many more are
written by users who desire special features. Examples of
the latter may be found in computer user society bulletins
and in the literature [1, 2, 3].

Unfortunately, existing utility programs are often
difficult to use because, ironically, they try to appeal to
the novice user by requiring an elaborate verbal intercourse.
One possible reason for this is that writers of
debugging routines infuse system batch processing concepts
into what should be a simple, elegant online program.

The need for writing a debugging program for the expert
is noted by Lampson [3]: “An interactive debugging
system should not be designed for the occasional user.
Its emphasis must be on completeness, convenience, and
conciseness, not on highly mnemonic commands and self-explanatory
output.” Lampson streamlines his program
by using single-letter mnemonics and special control characters,
but does not address himself to the problem of
correcting typing errors. The present paper contains
further typing shortcuts useful in debugging, and stresses
ease of error correction. The techniques described here
can be applied to many existing programs. They are illustrated
by specific reference to OPAK, a recently written
debugging program. The description of OPAK is limited
to a discussion of its user-program communication, plus a
brief outline of its features.

In Section 2 certain communication defects found in
many recently written debugging programs are briefly
illustrated, and in Section 3 examples of how to overcome
these defects are given.

In Section 4 the balance between economy of core allocation
and inclusion of elaborate features in the debugging
program is discussed.

2. Difficulties with Present Programs


Commonly found defects in the user-program communi- 
cation structure of debugging programs are: (1) too much 
user typing is required; (2) it is too difficult to correct user- 
detected typing errors; and (3) program-detected typing 
errors result in computer action bordering on the puni- 
tive — lengthy time consuming messages are typed, and 
the user often must retype whole sequences of commands 
to correct the error. Each of these defects can be illustrated 
with examples paraphrased from existing debugging 
programs. 

1. Too much typing. To aid the novice user, whole 
words must be typed as program directives (PUNCH, 
STORE, LOAD, etc.). For example, to get an octal 
dump of locations 4134 to 4200, type OCTAL(CAR RET)
4134␣TO␣4200(CAR RET). (The symbol ␣ denotes
space.) All words must be spelled properly, and any al- 
teration of the indicated sequence results in an error re- 
turn. 

2. Correction of user-detected errors. If the user detects 
his own typing error, he is generally required to hit a 
special key (perhaps X), and then possibly a termination 
code. This returns the program to the “wait for new com- 
mand” mode, possibly at the expense of wasting a whole 
line of good typing. The user must now take a separate 
action to type his correct sequence. 

3. Lengthy error diagnostics. It can be infuriating to an 
experienced programmer to have to sit helplessly while a 
typewriter types: ILLEGAL CONTROL WORD. 
PLEASE TRY AGAIN. In another example, one existing 
program waits for an entire line of text to be typed before 
doing any error detection, and then rejects the whole 
line if an error appears anywhere. 

3. Streamlining the Debugging Program 

In early 1964, A. D. Hause of Bell Telephone Labora- 
tories wrote a program called “Octal Package for the 
PDP-5 Computer.” It was an extremely compact program
(less than 400₈ words), which had certain elegant
algorithms for ease of typing and error diagnosing. The 
author, while at New York University in 1965, spent 
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